- Task-based learning.
- Quick start + References.
- Plan: Skim through some examples and work closely on two of them.
7 March 2016
Exploratory analysis
Confirmatory analysis
Data: Copenhagen Reinsurance, 2,167 fire loss records from 1980 to 1990.
library(MASS)
model <- fitdistr(danish, 'lognormal')
Data: death counts, ~100 features, a few thousand data points.
lognormal_model <- fitdistr(danish, 'lognormal')
gamma_model <- fitdistr(danish, 'gamma')
print(c(lognormal_model$loglik, gamma_model$loglik))
## [1] -4433.891 -5243.027
The raw log-likelihoods are hard to interpret on their own!
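One standard way to make the two log-likelihoods comparable is an information criterion. A minimal sketch, assuming the log-likelihood values printed above and that both fitted distributions have two estimated parameters:

```r
# Log-likelihoods from the two fitdistr() fits above
loglik <- c(lognormal = -4433.891, gamma = -5243.027)

# AIC = 2k - 2*logLik; both distributions have k = 2 parameters
aic <- 2 * 2 - 2 * loglik
aic  # lower is better: the lognormal fit is preferred here
```

In practice `AIC(lognormal_model)` should give the same number, since `fitdistr` objects carry a log-likelihood.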
Here is the output of a linear regression model.
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.202860 -0.140567 -0.003021  0.141850  0.201984
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.003066   0.016315   0.188    0.851
## x           0.998149   0.009412 106.054   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1419 on 299 degrees of freedom
## Multiple R-squared:  0.9741, Adjusted R-squared:  0.974
## F-statistic: 1.125e+04 on 1 and 299 DF,  p-value: < 2.2e-16
plot(x, y, type = 'l')
lines(x, fitted(lm_model), col = 'red', lwd = 2)
res <- y - fitted(lm_model)
plot(x, res, type = 'l', ylim = c(-0.4, 0.4))
Fairly clear we have missed something.
?COMMAND_NAME, e.g.
?max
install.packages("PACKAGE_NAME"), e.g.
install.packages("d3heatmap")
"2013 - current - future" regime:
"2016 - future" regime:
"Data sets come in many formats…but R prefers just one." - Garrett Grolemund
(See other slides.)
Reference: data wrangling with R https://www.rstudio.com/resources/webinars/data-wrangling-with-r-and-rstudio/
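The "one format" in the quote is tidy (long) data: one row per observation, one column per variable. A minimal sketch of reshaping a wide table into that form with `tidyr::gather()`; the table and column names here are made up for illustration:

```r
library(tidyr)

# A small "wide" table: one column per year (hypothetical data)
wide <- data.frame(station = c("A", "B"),
                   y1980 = c(10, 20),
                   y1990 = c(15, 25))

# Tidy/long form: one row per (station, year) observation
long <- gather(wide, key = "year", value = "losses", y1980, y1990)
long
```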
library(ggplot2)
data(diamonds)
library(tabplot)
tableplot(diamonds)
Reference: http://www.jds-online.com/file_download/379/JDS-1108.pdf
load('input_output/cleaned_data', verbose = TRUE)
library(corrplot)
correlation_matrix <- cor(data1[,10:30])
corrplot(correlation_matrix, order = "hclust", addrect = 8)
## Loading objects:
##   data1
Always have your cheatsheets right next to you.
Do not over-customise your graphics
References:
We are going to create a plot similar to this:
Specialized packages:
General packages:
Two things you need to construct a basic map:
load('input_output/leaflet_data')
head(map_data, 5)
##       station_name latitude longitude station pressure  time trend_coeff
## 1     PUNTA ARENAS   -53.00    -70.85   85934     2000  noon      -0.276
## 2         MARAMBIO   -64.23    -56.72   89055     2000  noon      -0.060
## 3            SYOWA   -69.00     39.58   89532     2000  noon      -0.666
## 4   AMUNDSEN-SCOTT   -90.00      0.00   89009     2000 night      -0.191
## 5 NOVOLAZARAVSKAJA   -70.77     11.83   89512     2000 night      -0.648
library(magrittr)  # for the %>% pipeline operator
library(leaflet)
leaflet(data = map_data) %>%
  addTiles() %>%
  addMarkers(~longitude, ~latitude, popup = ~as.character(station_name))
leaflet(data = map_data) %>%
  addProviderTiles("Stamen.Toner") %>%
  addMarkers(~longitude, ~latitude, popup = ~as.character(station_name))
data(iris)
DT::datatable(iris)
library(DT)
datatable(iris, options = list(pageLength = 6)) %>%
  formatStyle('Sepal.Width',
              backgroundColor = styleInterval(3, c('orange', 'white')))
Usage: explore data
data(mtcars)  # built-in dataset in R
library(d3heatmap)
d3heatmap(mtcars, scale = "column", colors = "Spectral")
Usage: explore correlation between predictors
load('input_output/cleaned_data', verbose = TRUE)
library(d3heatmap)
correlation_matrix <- cor(data1[,2:50])
d3heatmap(correlation_matrix, dendrogram = 'none')
## Loading objects:
##   data1
Supports: MATLAB, Python, R, JavaScript, Excel, and others.
Reference:
Five components:
Next week: Shiny (R) and Tableau